After running a regression model, the most common way of interpreting the relationships between predictors and the outcome is through the regression coefficients. In many situations these interpretations are straightforward; however, in some settings they can be tricky, or the coefficients may simply not be interpretable.
These complications arise most commonly when a model involves an interaction or moderation. In such models, interpreting the coefficients requires special care due to the relationship between the interacted predictors, and the information obtainable from the coefficients alone is often limited without resorting to significant algebra.
Marginal effects offer an improvement over simple coefficient interpretation. Marginal means allow you to estimate the average response under a set of conditions; e.g. the average response for each racial group, or the average response when age and weight are at certain values.
Marginal slopes estimate the slope between a predictor and the outcome when other variables are at some specific value; e.g. the average slope on age within each gender, or the average slope on age when weight is at certain values.
In either case, in addition to estimating these marginal effects, we can easily test hypotheses of equality between them via pairwise comparisons. For example, is the average response different by ethnicity? Is the slope on age different between genders?
If you are familiar with interpreting regression coefficients, and specifically interactions, you may have some idea of how to start addressing all the above examples. However, completely answering these research questions would require more than simply a table of regression coefficients. With marginal effects, one additional set of results can address all these questions.
Finally, marginal effect estimation is a step towards creating informative visualizations of relationships such as interaction plots.
The webuse command can load the data directly.
. webuse margex
(Artificial data for margins)
The marginaleffects package in R doesn’t like a variable
being named “group”, so we rename it.
. rename group class
. summarize, sep(0)
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
y | 3,000 69.73357 21.53986 0 146.3
outcome | 3,000 .1696667 .3754023 0 1
sex | 3,000 .5006667 .5000829 0 1
class | 3,000 1.828 .7732714 1 3
age | 3,000 39.799 11.54174 20 60
distance | 3,000 58.58566 181.3987 .25 748.8927
ycn | 3,000 74.47263 22.01264 0 153.8
yc | 3,000 .2413333 .4279633 0 1
treatment | 3,000 .5006667 .5000829 0 1
agegroup | 3,000 2.263667 .8316913 1 3
arm | 3,000 1.970667 .7899457 1 3
The haven package is used to read the data from Stata
into R.
library(haven)
m <- read_dta("http://www.stata-press.com/data/r15/margex.dta")
m <- data.frame(m)
m$sex <- as_factor(m$sex)
names(m)[names(m) == "group"] <- "class"
m$class <- as_factor(m$class)
head(m)
      y outcome    sex class age distance   ycn yc treatment agegroup arm
1 102.6 1 female 1 55 11.75 97.3 1 1 3 1
2 77.2 0 female 1 60 30.75 78.6 0 1 3 1
3 80.5 0 female 1 27 14.25 54.5 0 1 1 1
4 82.5 1 female 1 60 29.25 98.4 1 1 3 1
5 71.6 1 female 1 52 19.00 105.3 1 1 3 1
6 83.2 1 female 1 49 2.50 89.7 0 1 3 1
There are several packages which can be used to estimate marginal
means. So far, this document implements emmeans,
ggeffects, and marginaleffects.
emmeans is based on an older package,
lsmeans, and is more flexible and powerful. However, its
syntax is slightly strange, and there are often several ways to obtain
similar results which may or may not agree. Additionally, while
its immediate output is clear, accessing the raw data producing the
output takes some doing.
ggeffects is a newer package designed to tie into the
dplyr/ggplot universe. Its results are always a clean data frame which
can be manipulated or passed straight into ggplot. However,
ggeffects can only estimate marginal means, not
marginal slopes or pairwise comparisons. (If I’m wrong about that last
statement, please let me know.)
In addition, the interactions package is used to plot
interaction effects.
ggeffects

The ggeffects package requires several other packages to
work fully. However, it does not declare them as dependencies and
install them for you; rather, it requires you to ensure you have the
following packages installed as well:

- effects
- emmeans
- ggplot2

If you do not have those packages loaded, ggeffects will
load them on demand (rather than at the time of loading
ggeffects). If you do not have a required package
installed, ggeffects will instead produce a warning, so
keep an eye out for those and install packages as needed.
This document was last compiled using Stata/SE 17.0 and R 4.2.2.
R package versions at last compilation:
| Package | Version |
|---|---|
| emmeans | 1.8.5 |
| ggeffects | 1.2.0 |
| effects | 4.2.2 |
| ggplot2 | 3.4.1 |
| interactions | 1.1.5 |
. regress y i.class
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(2, 2997) = 15.48
Model | 14229.0349 2 7114.51743 Prob > F = 0.0000
Residual | 1377203.97 2,997 459.527518 R-squared = 0.0102
-------------+---------------------------------- Adj R-squared = 0.0096
Total | 1391433.01 2,999 463.965657 Root MSE = 21.437
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
class |
2 | .4084991 .8912269 0.46 0.647 -1.338979 2.155977
3 | 5.373138 1.027651 5.23 0.000 3.358165 7.38811
|
_cons | 68.35805 .6190791 110.42 0.000 67.14419 69.57191
------------------------------------------------------------------------------
. margins class
Adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
class |
1 | 68.35805 .6190791 110.42 0.000 67.14419 69.57191
2 | 68.76655 .6411134 107.26 0.000 67.50948 70.02361
3 | 73.73119 .8202484 89.89 0.000 72.12288 75.33949
------------------------------------------------------------------------------
emmeans:
Call:
lm(formula = y ~ class, data = m)
Residuals:
Min 1Q Median 3Q Max
-73.531 -14.758 0.101 14.536 72.569
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.3580 0.6191 110.419 < 2e-16 ***
class2 0.4085 0.8912 0.458 0.647
class3 5.3731 1.0277 5.229 1.83e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared: 0.01023, Adjusted R-squared: 0.009566
F-statistic: 15.48 on 2 and 2997 DF, p-value: 2.045e-07
class emmean SE df lower.CL upper.CL
1 68.4 0.619 2997 67.1 69.6
2 68.8 0.641 2997 67.5 70.0
3 73.7 0.820 2997 72.1 75.3
Confidence level used: 0.95
Note that by default, emmeans returns confidence intervals.
If you prefer p-values, you can wrap the emmeans call in
test:
class emmean SE df t.ratio p.value
1 68.4 0.619 2997 110.419 <.0001
2 68.8 0.641 2997 107.261 <.0001
3 73.7 0.820 2997 89.889 <.0001
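The original code chunks were not preserved above, but the calls producing the emmeans output would look roughly like this sketch (assuming the data frame m loaded earlier; the model name mod1 matches its use later in this document):

```r
library(emmeans)

# Fit the one-factor model shown in the summary above
mod1 <- lm(y ~ class, data = m)

# Marginal means with confidence intervals (the default display)
emmeans(mod1, "class")

# Wrapping the call in test() swaps the intervals for t-ratios and p-values
test(emmeans(mod1, "class"))
```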
ggeffects:
Call:
lm(formula = y ~ class, data = m)
Residuals:
Min 1Q Median 3Q Max
-73.531 -14.758 0.101 14.536 72.569
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.3580 0.6191 110.419 < 2e-16 ***
class2 0.4085 0.8912 0.458 0.647
class3 5.3731 1.0277 5.229 1.83e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared: 0.01023, Adjusted R-squared: 0.009566
F-statistic: 15.48 on 2 and 2997 DF, p-value: 2.045e-07
# Predicted values of y
class | Predicted | 95% CI
----------------------------------
1 | 68.36 | [67.14, 69.57]
2 | 68.77 | [67.51, 70.02]
3 | 73.73 | [72.12, 75.34]
Note that we could have run that as
ggeffect(mod1, ~ class), with a formula. However, when we
later want to specify certain values of the predictors, we need to use
the character version, so I stick with it here.
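The character-based call described above would be, as a sketch (assuming mod1 fit as lm(y ~ class, data = m)):

```r
library(ggeffects)

# Character syntax for terms; ggeffect(mod1, ~ class) is the formula equivalent
ggeffect(mod1, terms = "class")
```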
marginaleffects:
Call:
lm(formula = y ~ class, data = m)
Residuals:
Min 1Q Median 3Q Max
-73.531 -14.758 0.101 14.536 72.569
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.3580 0.6191 110.419 < 2e-16 ***
class2 0.4085 0.8912 0.458 0.647
class3 5.3731 1.0277 5.229 1.83e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared: 0.01023, Adjusted R-squared: 0.009566
F-statistic: 15.48 on 2 and 2997 DF, p-value: 2.045e-07
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % class
68.4 0.619 110.4 <0.001 67.1 69.6 1
68.8 0.641 107.3 <0.001 67.5 70.0 2
73.7 0.820 89.9 <0.001 72.1 75.3 3
Columns: rowid, estimate, std.error, statistic, p.value, conf.low, conf.high, y, class
Term Value Mean Std. Error z Pr(>|z|) 2.5 % 97.5 %
class 1 68.4 0.619 110.4 <0.001 67.1 69.6
class 2 68.8 0.641 107.3 <0.001 67.5 70.0
class 3 73.7 0.820 89.9 <0.001 72.1 75.3
Results averaged over levels of: class
Columns: rowid, term, value, class, estimate, std.error, statistic, p.value, conf.low, conf.high, wts, y
The marginalmeans() function is provided as a direct
translation for emmeans users. The
predictions() function will be used more commonly, so I
recommend becoming familiar with that.
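A sketch of the two marginaleffects approaches (assuming mod1 fit as lm(y ~ class, data = m)):

```r
library(marginaleffects)

# predictions(): one row per level of class, built via datagrid()
predictions(mod1, newdata = datagrid(class = unique))

# marginalmeans(): the emmeans-style translation
marginalmeans(mod1, variables = "class")
```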
The datagrid() function can be frustrating: putting in
“wrong” values produces no error or warning and simply produces unexpected
results. For example,
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % class
68.4 0.619 110.4 <0.001 67.1 69.6 1
68.8 0.641 107.3 <0.001 67.5 70.0 2
73.7 0.820 89.9 <0.001 72.1 75.3 3
Columns: rowid, estimate, std.error, statistic, p.value, conf.low, conf.high, y, class
is an alternative to class = unique to specify the
levels. (NOTE: These are the “levels”, not the values; it just happens
that class has the same. See below for what happens when they
differ.) We can specify a subset of levels as well.
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % class
68.4 0.619 110 <0.001 67.1 69.6 1
68.8 0.641 107 <0.001 67.5 70.0 2
Columns: rowid, estimate, std.error, statistic, p.value, conf.low, conf.high, y, class
However, what if we included a level that wasn’t found in the model?
For example, if we thought class took on values 0-2 instead
of 1-3?
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
68.4 0.619 110.4 <0.001 67.1 69.6
68.4 0.619 110.4 <0.001 67.1 69.6
68.4 0.619 110.4 <0.001 67.1 69.6
68.4 0.619 110.4 <0.001 67.1 69.6
68.4 0.619 110.4 <0.001 67.1 69.6
--- 2990 rows omitted. See ?avg_predictions and ?print.marginaleffects ---
73.7 0.820 89.9 <0.001 72.1 75.3
73.7 0.820 89.9 <0.001 72.1 75.3
73.7 0.820 89.9 <0.001 72.1 75.3
73.7 0.820 89.9 <0.001 72.1 75.3
73.7 0.820 89.9 <0.001 72.1 75.3
Columns: rowid, estimate, std.error, statistic, p.value, conf.low, conf.high, y, class
In this case, the newdata argument (the second argument,
to which datagrid()’s results are passed) is ignored
without warning, and we get results identical to simply calling
predictions(mod1).
If your factor has labels which are not just the values, you need to refer to those labels.
Factor w/ 2 levels "male","female": 2 2 2 2 2 2 2 2 2 2 ...
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
74.9 0.54 139 <0.001 73.8 75.9
74.9 0.54 139 <0.001 73.8 75.9
74.9 0.54 139 <0.001 73.8 75.9
74.9 0.54 139 <0.001 73.8 75.9
74.9 0.54 139 <0.001 73.8 75.9
--- 2990 rows omitted. See ?avg_predictions and ?print.marginaleffects ---
64.6 0.54 119 <0.001 63.5 65.6
64.6 0.54 119 <0.001 63.5 65.6
64.6 0.54 119 <0.001 63.5 65.6
64.6 0.54 119 <0.001 63.5 65.6
64.6 0.54 119 <0.001 63.5 65.6
Columns: rowid, estimate, std.error, statistic, p.value, conf.low, conf.high, y, sex
Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % sex
64.6 0.54 119 <0.001 63.5 65.6 male
74.9 0.54 139 <0.001 73.8 75.9 female
Columns: rowid, estimate, std.error, statistic, p.value, conf.low, conf.high, y, sex
Assume a linear regression set up with a single categorical variable, \(G\), with three groups. We fit
\[ E(Y|G) = \beta_0 + \beta_1g_2 + \beta_2g_3, \]
where \(g_2\) and \(g_3\) are dummy variables representing membership in groups 2 and 3 respectively (\(g_{2i} = 1\) if observation \(i\) has \(G = 2\), and equal to 0 otherwise.) Since \(g_1\) is not found in the model, it is the reference category.
Therefore, \(\beta_0\) represents the average response among the reference category \(G = 1\). \(\beta_1\) represents the difference in the average response between groups \(G = 1\) and \(G = 2\). Therefore, \(\beta_0 + \beta_1\) is the average response in group \(G = 2\). A similar argument can be made about \(\beta_2\) and group 3.
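This arithmetic can be checked directly against the fitted model; a minimal sketch, assuming mod1 fit as lm(y ~ class, data = m):

```r
b <- coef(mod1)

# Group means reconstructed from the coefficients; these should match
# the marginal means reported above
b[["(Intercept)"]]                  # mean for class 1 (reference group)
b[["(Intercept)"]] + b[["class2"]]  # mean for class 2
b[["(Intercept)"]] + b[["class3"]]  # mean for class 3
```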
. regress y i.sex##i.class
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(5, 2994) = 92.91
Model | 186898.199 5 37379.6398 Prob > F = 0.0000
Residual | 1204534.81 2,994 402.316235 R-squared = 0.1343
-------------+---------------------------------- Adj R-squared = 0.1329
Total | 1391433.01 2,999 463.965657 Root MSE = 20.058
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
sex |
Female | 21.62507 1.509999 14.32 0.000 18.66433 24.58581
|
class |
2 | 11.37444 1.573314 7.23 0.000 8.289552 14.45932
3 | 21.6188 1.588487 13.61 0.000 18.50416 24.73344
|
sex#class |
Female#2 | -4.851582 1.942744 -2.50 0.013 -8.66083 -1.042333
Female#3 | -6.084875 3.004638 -2.03 0.043 -11.97624 -.1935115
|
_cons | 50.6107 1.367932 37.00 0.000 47.92852 53.29288
------------------------------------------------------------------------------
. margins sex#class
Adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
sex#class |
Male#1 | 50.6107 1.367932 37.00 0.000 47.92852 53.29288
Male#2 | 61.98514 .7772248 79.75 0.000 60.46119 63.50908
Male#3 | 72.2295 .8074975 89.45 0.000 70.64619 73.8128
Female#1 | 72.23577 .63942 112.97 0.000 70.98203 73.48952
Female#2 | 78.75863 .9434406 83.48 0.000 76.90877 80.60849
Female#3 | 87.7697 2.468947 35.55 0.000 82.92869 92.6107
------------------------------------------------------------------------------
emmeans:
Call:
lm(formula = y ~ sex * class, data = m)
Residuals:
Min 1Q Median 3Q Max
-72.029 -13.285 0.464 13.741 67.964
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.611 1.368 36.998 < 2e-16 ***
sexfemale 21.625 1.510 14.321 < 2e-16 ***
class2 11.374 1.573 7.230 6.12e-13 ***
class3 21.619 1.588 13.610 < 2e-16 ***
sexfemale:class2 -4.852 1.943 -2.497 0.0126 *
sexfemale:class3 -6.085 3.005 -2.025 0.0429 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared: 0.1343, Adjusted R-squared: 0.1329
F-statistic: 92.91 on 5 and 2994 DF, p-value: < 2.2e-16
class sex emmean SE df lower.CL upper.CL
1 male 50.6 1.368 2994 47.9 53.3
2 male 62.0 0.777 2994 60.5 63.5
3 male 72.2 0.807 2994 70.6 73.8
1 female 72.2 0.639 2994 71.0 73.5
2 female 78.8 0.943 2994 76.9 80.6
3 female 87.8 2.469 2994 82.9 92.6
Confidence level used: 0.95
class sex emmean SE df t.ratio p.value
1 male 50.6 1.368 2994 36.998 <.0001
2 male 62.0 0.777 2994 79.752 <.0001
3 male 72.2 0.807 2994 89.449 <.0001
1 female 72.2 0.639 2994 112.971 <.0001
2 female 78.8 0.943 2994 83.480 <.0001
3 female 87.8 2.469 2994 35.549 <.0001
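A sketch of the corresponding emmeans calls (the model name mod2 is an assumption; the formula matches the Call shown above):

```r
library(emmeans)

mod2 <- lm(y ~ sex * class, data = m)

# One marginal mean per sex-by-class cell
emmeans(mod2, c("class", "sex"))

# As before, test() replaces the intervals with t-ratios and p-values
test(emmeans(mod2, c("class", "sex")))
```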
ggeffects:
Call:
lm(formula = y ~ sex * class, data = m)
Residuals:
Min 1Q Median 3Q Max
-72.029 -13.285 0.464 13.741 67.964
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.611 1.368 36.998 < 2e-16 ***
sexfemale 21.625 1.510 14.321 < 2e-16 ***
class2 11.374 1.573 7.230 6.12e-13 ***
class3 21.619 1.588 13.610 < 2e-16 ***
sexfemale:class2 -4.852 1.943 -2.497 0.0126 *
sexfemale:class3 -6.085 3.005 -2.025 0.0429 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared: 0.1343, Adjusted R-squared: 0.1329
F-statistic: 92.91 on 5 and 2994 DF, p-value: < 2.2e-16
# Predicted values of y
# sex = male
class | Predicted | 95% CI
----------------------------------
1 | 50.61 | [47.93, 53.29]
2 | 61.99 | [60.46, 63.51]
3 | 72.23 | [70.65, 73.81]
# sex = female
class | Predicted | 95% CI
----------------------------------
1 | 72.24 | [70.98, 73.49]
2 | 78.76 | [76.91, 80.61]
3 | 87.77 | [82.93, 92.61]
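The ggeffects version is a sketch along these lines (mod2 is an assumed name for the lm(y ~ sex * class, data = m) fit):

```r
library(ggeffects)

mod2 <- lm(y ~ sex * class, data = m)

# The second element of `terms` becomes the grouping variable,
# producing the "sex = ..." blocks shown above
ggeffect(mod2, terms = c("class", "sex"))
```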
Consider the simplest case, with two binary variables \(Z\) and \(K\). We fit the model
\[ E(Y|Z,K) = \beta_0 + \beta_1Z + \beta_2K + \beta_3ZK, \]
where \(ZK = 1\) only if both \(Z = 1\) and \(K = 1\).
This is functionally equivalent to defining a new variable \(L\),
\[ L = \begin{cases} 0, & K = 0\ \&\ Z = 0 \\ 1, & K = 0\ \&\ Z = 1 \\ 2, & K = 1\ \&\ Z = 0 \\ 3, & K = 1\ \&\ Z = 1, \end{cases} \]
and fitting the model
\[ E(Y|L) = \beta_0 + \beta_1l_1 + \beta_2l_2 + \beta_3l_3. \]
. regress y i.sex c.age c.distance
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(3, 2996) = 143.27
Model | 174576.269 3 58192.0896 Prob > F = 0.0000
Residual | 1216856.74 2,996 406.16046 R-squared = 0.1255
-------------+---------------------------------- Adj R-squared = 0.1246
Total | 1391433.01 2,999 463.965657 Root MSE = 20.153
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
sex |
Female | 13.98753 .7768541 18.01 0.000 12.46431 15.51075
age | -.5038264 .0336538 -14.97 0.000 -.5698133 -.4378394
distance | -.0060725 .0020291 -2.99 0.003 -.0100511 -.0020939
_cons | 83.13802 1.327228 62.64 0.000 80.53565 85.74039
------------------------------------------------------------------------------
. margins, at(age = (30 40))
Predictive margins Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
1._at: age = 30
2._at: age = 40
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
_at |
1 | 74.67056 .4941029 151.12 0.000 73.70175 75.63938
2 | 69.6323 .3680117 189.21 0.000 68.91072 70.35388
------------------------------------------------------------------------------
. margins, at(age = (30 40) distance = (10 100))
Predictive margins Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
1._at: age = 30
distance = 10
2._at: age = 30
distance = 100
3._at: age = 40
distance = 10
4._at: age = 40
distance = 100
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
_at |
1 | 74.9656 .5034959 148.89 0.000 73.97837 75.95283
2 | 74.41907 .5014946 148.39 0.000 73.43576 75.40238
3 | 69.92733 .3809974 183.54 0.000 69.18029 70.67438
4 | 69.38081 .3774763 183.80 0.000 68.64067 70.12095
------------------------------------------------------------------------------
. margins sex, at(age = (30 40))
Predictive margins Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
1._at: age = 30
2._at: age = 40
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
_at#sex |
1#Male | 67.66747 .5597763 120.88 0.000 66.56989 68.76506
1#Female | 81.655 .6902601 118.30 0.000 80.30157 83.00843
2#Male | 62.62921 .5370234 116.62 0.000 61.57624 63.68218
2#Female | 76.61674 .5331296 143.71 0.000 75.5714 77.66208
------------------------------------------------------------------------------
emmeans:

Note that any variable specified in the at option must
also appear in the formula. Any variables you place in the formula but
not in at will be examined at each unique level (so don’t put
a continuous variable there!).
Call:
lm(formula = y ~ sex + age + distance, data = m)
Residuals:
Min 1Q Median 3Q Max
-72.356 -13.792 -0.275 13.624 74.950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 83.138023 1.327228 62.640 < 2e-16 ***
sexfemale 13.987531 0.776854 18.005 < 2e-16 ***
age -0.503826 0.033654 -14.971 < 2e-16 ***
distance -0.006073 0.002029 -2.993 0.00279 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.15 on 2996 degrees of freedom
Multiple R-squared: 0.1255, Adjusted R-squared: 0.1246
F-statistic: 143.3 on 3 and 2996 DF, p-value: < 2.2e-16
age emmean SE df lower.CL upper.CL
30 74.7 0.494 2996 73.7 75.6
40 69.6 0.368 2996 68.9 70.3
Results are averaged over the levels of: sex
Confidence level used: 0.95
age emmean SE df t.ratio p.value
30 74.7 0.494 2996 151.138 <.0001
40 69.6 0.368 2996 189.185 <.0001
Results are averaged over the levels of: sex
age distance emmean SE df lower.CL upper.CL
30 10 75.0 0.503 2996 74.0 75.9
40 10 69.9 0.381 2996 69.2 70.7
30 100 74.4 0.501 2996 73.4 75.4
40 100 69.4 0.377 2996 68.6 70.1
Results are averaged over the levels of: sex
Confidence level used: 0.95
age sex emmean SE df lower.CL upper.CL
30 male 67.7 0.560 2996 66.6 68.8
40 male 62.6 0.537 2996 61.6 63.7
30 female 81.7 0.690 2996 80.3 83.0
40 female 76.6 0.533 2996 75.6 77.7
Confidence level used: 0.95
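The at option described in the note above is used roughly like this (mod3 is an assumed name for the lm(y ~ sex + age + distance, data = m) fit):

```r
library(emmeans)

mod3 <- lm(y ~ sex + age + distance, data = m)

# Predictions at chosen ages, averaged over the levels of sex
emmeans(mod3, "age", at = list(age = c(30, 40)))

# Crossing two sets of values
emmeans(mod3, c("age", "distance"),
        at = list(age = c(30, 40), distance = c(10, 100)))

# By sex, at chosen ages
emmeans(mod3, c("age", "sex"), at = list(age = c(30, 40)))
```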
ggeffects:

Note the particular syntax for specifying particular values: a
quoted string containing the name of the variable, followed by a list in
square brackets of comma-separated values (a single value would be just
[4]). Whitespace is ignored.
Call:
lm(formula = y ~ sex + age + distance, data = m)
Residuals:
Min 1Q Median 3Q Max
-72.356 -13.792 -0.275 13.624 74.950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 83.138023 1.327228 62.640 < 2e-16 ***
sexfemale 13.987531 0.776854 18.005 < 2e-16 ***
age -0.503826 0.033654 -14.971 < 2e-16 ***
distance -0.006073 0.002029 -2.993 0.00279 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.15 on 2996 degrees of freedom
Multiple R-squared: 0.1255, Adjusted R-squared: 0.1246
F-statistic: 143.3 on 3 and 2996 DF, p-value: < 2.2e-16
# Predicted values of y
age | Predicted | 95% CI
--------------------------------
30 | 74.67 | [73.70, 75.64]
40 | 69.63 | [68.91, 70.35]
# Predicted values of y
# distance = 10
age | Predicted | 95% CI
--------------------------------
30 | 74.97 | [73.98, 75.95]
40 | 69.93 | [69.18, 70.67]
# distance = 100
age | Predicted | 95% CI
--------------------------------
30 | 74.42 | [73.44, 75.40]
40 | 69.38 | [68.64, 70.12]
# Predicted values of y
# age = 30
sex | Predicted | 95% CI
-----------------------------------
male | 67.67 | [66.57, 68.77]
female | 81.66 | [80.30, 83.01]
# age = 40
sex | Predicted | 95% CI
-----------------------------------
male | 62.63 | [61.58, 63.68]
female | 76.62 | [75.57, 77.66]
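The square-bracket syntax described above looks like this in practice (a sketch; mod3 is an assumed name for the lm(y ~ sex + age + distance, data = m) fit):

```r
library(ggeffects)

mod3 <- lm(y ~ sex + age + distance, data = m)

# Specific values go in square brackets inside the term string
ggeffect(mod3, terms = "age [30,40]")
ggeffect(mod3, terms = c("age [30,40]", "distance [10,100]"))
ggeffect(mod3, terms = c("sex", "age [30,40]"))
```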
Consider a model with a single continuous predictor, plus a control variable (which may be continuous or categorical).
\[ E(Y|X, Z) = \beta_0 + \beta_1X + \beta_2Z. \]
For any given value of \(X\), it is easy to compute the marginal mean for a given individual. For example,
\[ E(Y|X = 2, Z) = \beta_0 + 2\beta_1 + \beta_2Z. \]
Therefore we can easily compute \(E(y_i|X = 2, Z = z_i)\) for each individual (the predicted outcome if each individual had a value of \(X = 2\), with their other variables at their observed values) and average them.
. regress y i.sex##c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(3, 2996) = 139.91
Model | 170983.675 3 56994.5583 Prob > F = 0.0000
Residual | 1220449.33 2,996 407.35959 R-squared = 0.1229
-------------+---------------------------------- Adj R-squared = 0.1220
Total | 1391433.01 2,999 463.965657 Root MSE = 20.183
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
sex |
Female | 14.92308 2.789012 5.35 0.000 9.454508 20.39165
age | -.4929608 .0480944 -10.25 0.000 -.5872622 -.3986595
|
sex#c.age |
Female | -.0224116 .0674167 -0.33 0.740 -.1545994 .1097762
|
_cons | 82.36936 1.812958 45.43 0.000 78.8146 85.92413
------------------------------------------------------------------------------
. margins, at(age = (30 40))
Predictive margins Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
1._at: age = 30
2._at: age = 40
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
_at |
1 | 74.71541 .5089298 146.81 0.000 73.71752 75.71329
2 | 69.67359 .3890273 179.10 0.000 68.9108 70.43638
------------------------------------------------------------------------------
. margins sex, at(age = (30 40))
Adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
1._at: age = 30
2._at: age = 40
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
_at#sex |
1#Male | 67.58054 .5984013 112.94 0.000 66.40722 68.75386
1#Female | 81.83127 .8228617 99.45 0.000 80.21784 83.4447
2#Male | 62.65093 .5541361 113.06 0.000 61.56441 63.73746
2#Female | 76.67755 .5461908 140.39 0.000 75.6066 77.74849
------------------------------------------------------------------------------
emmeans:
Call:
lm(formula = y ~ sex * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-71.817 -13.848 -0.116 13.758 75.263
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.36936 1.81296 45.434 < 2e-16 ***
sexfemale 14.92308 2.78901 5.351 9.42e-08 ***
age -0.49296 0.04809 -10.250 < 2e-16 ***
sexfemale:age -0.02241 0.06742 -0.332 0.74
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared: 0.1229, Adjusted R-squared: 0.122
F-statistic: 139.9 on 3 and 2996 DF, p-value: < 2.2e-16
age emmean SE df lower.CL upper.CL
30 74.7 0.509 2996 73.7 75.7
40 69.7 0.389 2996 68.9 70.4
Results are averaged over the levels of: sex
Confidence level used: 0.95
age emmean SE df t.ratio p.value
30 74.7 0.509 2996 146.851 <.0001
40 69.7 0.389 2996 179.070 <.0001
Results are averaged over the levels of: sex
age sex emmean SE df t.ratio p.value
30 male 67.6 0.598 2996 112.935 <.0001
40 male 62.7 0.554 2996 113.061 <.0001
30 female 81.8 0.823 2996 99.447 <.0001
40 female 76.7 0.546 2996 140.386 <.0001
ggeffects:
Call:
lm(formula = y ~ sex * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-71.817 -13.848 -0.116 13.758 75.263
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.36936 1.81296 45.434 < 2e-16 ***
sexfemale 14.92308 2.78901 5.351 9.42e-08 ***
age -0.49296 0.04809 -10.250 < 2e-16 ***
sexfemale:age -0.02241 0.06742 -0.332 0.74
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared: 0.1229, Adjusted R-squared: 0.122
F-statistic: 139.9 on 3 and 2996 DF, p-value: < 2.2e-16
# Predicted values of y
age | Predicted | 95% CI
--------------------------------
30 | 74.72 | [73.72, 75.71]
40 | 69.67 | [68.91, 70.44]
# Predicted values of y
# age = 30
sex | Predicted | 95% CI
-----------------------------------
male | 67.58 | [66.41, 68.75]
female | 81.83 | [80.22, 83.44]
# age = 40
sex | Predicted | 95% CI
-----------------------------------
male | 62.65 | [61.56, 63.74]
female | 76.68 | [75.61, 77.75]
. regress y i.sex c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(2, 2997) = 209.88
Model | 170938.657 2 85469.3283 Prob > F = 0.0000
Residual | 1220494.35 2,997 407.238688 R-squared = 0.1229
-------------+---------------------------------- Adj R-squared = 0.1223
Total | 1391433.01 2,999 463.965657 Root MSE = 20.18
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
sex |
Female | 14.03271 .7777377 18.04 0.000 12.50775 15.55766
age | -.5043666 .033698 -14.97 0.000 -.5704402 -.4382931
_cons | 82.78115 1.323613 62.54 0.000 80.18586 85.37643
------------------------------------------------------------------------------
. margins, dydx(age)
Average marginal effects Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
dy/dx wrt: age
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.5043666 .033698 -14.97 0.000 -.5704402 -.4382931
------------------------------------------------------------------------------
emmeans:
Call:
lm(formula = y ~ sex + age, data = m)
Residuals:
Min 1Q Median 3Q Max
-71.99 -13.88 -0.15 13.76 75.28
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.7811 1.3236 62.54 <2e-16 ***
sexfemale 14.0327 0.7777 18.04 <2e-16 ***
age -0.5044 0.0337 -14.97 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.18 on 2997 degrees of freedom
Multiple R-squared: 0.1229, Adjusted R-squared: 0.1223
F-statistic: 209.9 on 2 and 2997 DF, p-value: < 2.2e-16
1 age.trend SE df lower.CL upper.CL
overall -0.504 0.0337 2997 -0.57 -0.438
Results are averaged over the levels of: sex
Confidence level used: 0.95
To attach a p-value, wrap the call in the test function:
1 age.trend SE df t.ratio p.value
overall -0.504 0.0337 2997 -14.967 <.0001
Results are averaged over the levels of: sex
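The marginal slope output above comes from emtrends(); a sketch of the calls (mod4 is an assumed name for the lm(y ~ sex + age, data = m) fit):

```r
library(emmeans)

mod4 <- lm(y ~ sex + age, data = m)

# Overall average slope on age (specs ~ 1 collapses over the factors)
emtrends(mod4, ~ 1, var = "age")

# Wrap in test() to attach the t-ratio and p-value
test(emtrends(mod4, ~ 1, var = "age"))
```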
The Stata margins command to obtain marginal effects
includes the dydx option, whose name points to a derivative
(dy/dx), and indeed this is exactly what is computed. If we have a basic
regression model,
\[ E(Y|X) = \beta_0 + \beta_1X \]
taking the derivative with respect to \(X\) will obtain \(\beta_1\), which is the estimated coefficient.
If \(X\) enters the model in a more complex way, say with a polynomial term:
\[ E(Y|X) = \beta_0 + \beta_1X + \beta_2X^2, \]
the derivative is now \(\beta_1 + 2\beta_2X\). Similar to marginal means, where the predicted outcome was estimated for each individual and then those outcomes averaged, here the derivative is estimated by plugging in each observation’s value of \(X\), then averaged.
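The averaging described above can be done by hand for a hypothetical quadratic model (this model is not fit elsewhere in this document; it is only an illustration):

```r
# Hypothetical quadratic in age, to illustrate averaging the derivative
modq <- lm(y ~ age + I(age^2), data = m)
b <- coef(modq)

# Average marginal slope: mean over observations of b1 + 2*b2*age_i
mean(b[["age"]] + 2 * b[["I(age^2)"]] * m$age)
```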
. regress y i.sex##c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(3, 2996) = 139.91
Model | 170983.675 3 56994.5583 Prob > F = 0.0000
Residual | 1220449.33 2,996 407.35959 R-squared = 0.1229
-------------+---------------------------------- Adj R-squared = 0.1220
Total | 1391433.01 2,999 463.965657 Root MSE = 20.183
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
sex |
Female | 14.92308 2.789012 5.35 0.000 9.454508 20.39165
age | -.4929608 .0480944 -10.25 0.000 -.5872622 -.3986595
|
sex#c.age |
Female | -.0224116 .0674167 -0.33 0.740 -.1545994 .1097762
|
_cons | 82.36936 1.812958 45.43 0.000 78.8146 85.92413
------------------------------------------------------------------------------
. margins sex, dydx(age)
Average marginal effects Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
dy/dx wrt: age
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
age |
sex |
Male | -.4929608 .0480944 -10.25 0.000 -.5872622 -.3986595
Female | -.5153724 .0472435 -10.91 0.000 -.6080054 -.4227395
------------------------------------------------------------------------------
R (emmeans)
Call:
lm(formula = y ~ sex * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-71.817 -13.848 -0.116 13.758 75.263
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.36936 1.81296 45.434 < 2e-16 ***
sexfemale 14.92308 2.78901 5.351 9.42e-08 ***
age -0.49296 0.04809 -10.250 < 2e-16 ***
sexfemale:age -0.02241 0.06742 -0.332 0.74
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared: 0.1229, Adjusted R-squared: 0.122
F-statistic: 139.9 on 3 and 2996 DF, p-value: < 2.2e-16
sex age.trend SE df lower.CL upper.CL
male -0.493 0.0481 2996 -0.587 -0.399
female -0.515 0.0472 2996 -0.608 -0.423
Confidence level used: 0.95
sex age.trend SE df t.ratio p.value
male -0.493 0.0481 2996 -10.250 <.0001
female -0.515 0.0472 2996 -10.909 <.0001
Let’s assume we have a binary variable, \(Z\), interacted with a continuous variable, \(X\).
\[ E(Y|X,Z) = \beta_0 + \beta_1X + \beta_2Z + \beta_3XZ \]
Here we combine the logic of marginal means with that of marginal slopes. First, we generate two equations, one for \(Z = 0\) and one for \(Z = 1\):
\[\begin{align*} E(Y|X, Z = 0) & = \beta_0 + \beta_1X \\ E(Y|X, Z = 1) & = \beta_0 + \beta_1X + \beta_2 + \beta_3X = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)X. \end{align*}\]
Then when we differentiate each with respect to \(X\), we obtain \(\beta_1\) and \((\beta_1 + \beta_3)\) respectively.
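This is easy to verify directly; the sketch below uses simulated data (the names x, z, y are illustrative) and checks that the interaction model recovers a slope of \(\beta_1\) when \(Z = 0\) and \(\beta_1 + \beta_3\) when \(Z = 1\):

```r
set.seed(2)
n <- 500
x <- rnorm(n)
z <- rbinom(n, 1, 0.5)
y <- 1 + 0.8 * x + 0.3 * z - 0.5 * x * z + rnorm(n)

mod <- lm(y ~ x * z)
b <- coef(mod)

slope_z0 <- unname(b["x"])             # beta1
slope_z1 <- unname(b["x"] + b["x:z"])  # beta1 + beta3
c(z0 = slope_z0, z1 = slope_z1)
```

If z were coded as a factor, emmeans::emtrends(mod, ~ z, var = "x") would report these two group-specific slopes directly.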
. regress y i.class
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(2, 2997) = 15.48
Model | 14229.0349 2 7114.51743 Prob > F = 0.0000
Residual | 1377203.97 2,997 459.527518 R-squared = 0.0102
-------------+---------------------------------- Adj R-squared = 0.0096
Total | 1391433.01 2,999 463.965657 Root MSE = 21.437
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
class |
2 | .4084991 .8912269 0.46 0.647 -1.338979 2.155977
3 | 5.373138 1.027651 5.23 0.000 3.358165 7.38811
|
_cons | 68.35805 .6190791 110.42 0.000 67.14419 69.57191
------------------------------------------------------------------------------
. margins class, pwcompare(pv)
Pairwise comparisons of adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
-----------------------------------------------------
| Delta-method Unadjusted
| Contrast std. err. t P>|t|
-------------+---------------------------------------
class |
2 vs 1 | .4084991 .8912269 0.46 0.647
3 vs 1 | 5.373138 1.027651 5.23 0.000
3 vs 2 | 4.964638 1.041073 4.77 0.000
-----------------------------------------------------
R (emmeans)
Call:
lm(formula = y ~ class, data = m)
Residuals:
Min 1Q Median 3Q Max
-73.531 -14.758 0.101 14.536 72.569
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.3580 0.6191 110.419 < 2e-16 ***
class2 0.4085 0.8912 0.458 0.647
class3 5.3731 1.0277 5.229 1.83e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared: 0.01023, Adjusted R-squared: 0.009566
F-statistic: 15.48 on 2 and 2997 DF, p-value: 2.045e-07
contrast estimate SE df t.ratio p.value
class1 - class2 -0.408 0.891 2997 -0.458 0.6467
class1 - class3 -5.373 1.028 2997 -5.229 <.0001
class2 - class3 -4.965 1.041 2997 -4.769 <.0001
The adjust = "none" argument skips any multiple comparisons correction; this is done so the results match those obtained from the regression coefficients. If you'd prefer an ANOVA-style post-hoc approach, adjust accepts other options; see ?emmeans::contrast for details.
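As a runnable sketch of the adjust argument, using R's built-in warpbreaks data rather than the data from this example (the three-level factor tension plays the role of class):

```r
library(emmeans)

mod <- lm(breaks ~ tension, data = warpbreaks)
em <- emmeans(mod, ~ tension)

pairs(em, adjust = "none")   # unadjusted p-values, matching coefficient tests
pairs(em, adjust = "tukey")  # Tukey correction (the default for pairs())
```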
. regress y i.class##i.sex
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(5, 2994) = 92.91
Model | 186898.199 5 37379.6398 Prob > F = 0.0000
Residual | 1204534.81 2,994 402.316235 R-squared = 0.1343
-------------+---------------------------------- Adj R-squared = 0.1329
Total | 1391433.01 2,999 463.965657 Root MSE = 20.058
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
class |
2 | 11.37444 1.573314 7.23 0.000 8.289552 14.45932
3 | 21.6188 1.588487 13.61 0.000 18.50416 24.73344
|
sex |
Female | 21.62507 1.509999 14.32 0.000 18.66433 24.58581
|
class#sex |
2#Female | -4.851582 1.942744 -2.50 0.013 -8.66083 -1.042333
3#Female | -6.084875 3.004638 -2.03 0.043 -11.97624 -.1935115
|
_cons | 50.6107 1.367932 37.00 0.000 47.92852 53.29288
------------------------------------------------------------------------------
. margins class#sex, pwcompare(pv)
Pairwise comparisons of adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
------------------------------------------------------------------
| Delta-method Unadjusted
| Contrast std. err. t P>|t|
--------------------------+---------------------------------------
class#sex |
(1#Female) vs (1#Male) | 21.62507 1.509999 14.32 0.000
(2#Male) vs (1#Male) | 11.37444 1.573314 7.23 0.000
(2#Female) vs (1#Male) | 28.14793 1.661722 16.94 0.000
(3#Male) vs (1#Male) | 21.6188 1.588487 13.61 0.000
(3#Female) vs (1#Male) | 37.159 2.822577 13.16 0.000
(2#Male) vs (1#Female) | -10.25064 1.006447 -10.18 0.000
(2#Female) vs (1#Female) | 6.522856 1.13971 5.72 0.000
(3#Male) vs (1#Female) | -.0062748 1.030005 -0.01 0.995
(3#Female) vs (1#Female) | 15.53392 2.550404 6.09 0.000
(2#Female) vs (2#Male) | 16.77349 1.222358 13.72 0.000
(3#Male) vs (2#Male) | 10.24436 1.120772 9.14 0.000
(3#Female) vs (2#Male) | 25.78456 2.588393 9.96 0.000
(3#Male) vs (2#Female) | -6.529131 1.241826 -5.26 0.000
(3#Female) vs (2#Female) | 9.011069 2.643063 3.41 0.001
(3#Female) vs (3#Male) | 15.5402 2.597644 5.98 0.000
------------------------------------------------------------------
This produces a lot of comparisons; if we only cared about the comparisons one way (e.g. gender differences within each class), we can restrict our output to these results.
. margins sex@class, contrast(pv nowald)
Contrasts of adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
------------------------------------------------------------
| Delta-method
| Contrast std. err. t P>|t|
--------------------+---------------------------------------
sex@class |
(Female vs base) 1 | 21.62507 1.509999 14.32 0.000
(Female vs base) 2 | 16.77349 1.222358 13.72 0.000
(Female vs base) 3 | 15.5402 2.597644 5.98 0.000
------------------------------------------------------------
Note that the results are identical; this is simply a more compact presentation.
The pv option requests p-values instead of confidence
intervals; the nowald option suppresses an omnibus test of
any difference between classes (e.g. ANOVA output).
R (emmeans)
Call:
lm(formula = y ~ class * sex, data = m)
Residuals:
Min 1Q Median 3Q Max
-72.029 -13.285 0.464 13.741 67.964
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.611 1.368 36.998 < 2e-16 ***
class2 11.374 1.573 7.230 6.12e-13 ***
class3 21.619 1.588 13.610 < 2e-16 ***
sexfemale 21.625 1.510 14.321 < 2e-16 ***
class2:sexfemale -4.852 1.943 -2.497 0.0126 *
class3:sexfemale -6.085 3.005 -2.025 0.0429 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared: 0.1343, Adjusted R-squared: 0.1329
F-statistic: 92.91 on 5 and 2994 DF, p-value: < 2.2e-16
contrast estimate SE df t.ratio p.value
male class1 - female class1 -21.62507 1.51 2994 -14.321 <.0001
male class1 - male class2 -11.37444 1.57 2994 -7.230 <.0001
male class1 - female class2 -28.14793 1.66 2994 -16.939 <.0001
male class1 - male class3 -21.61880 1.59 2994 -13.610 <.0001
male class1 - female class3 -37.15900 2.82 2994 -13.165 <.0001
female class1 - male class2 10.25064 1.01 2994 10.185 <.0001
female class1 - female class2 -6.52286 1.14 2994 -5.723 <.0001
female class1 - male class3 0.00627 1.03 2994 0.006 0.9951
female class1 - female class3 -15.53392 2.55 2994 -6.091 <.0001
male class2 - female class2 -16.77349 1.22 2994 -13.722 <.0001
male class2 - male class3 -10.24436 1.12 2994 -9.140 <.0001
male class2 - female class3 -25.78456 2.59 2994 -9.962 <.0001
female class2 - male class3 6.52913 1.24 2994 5.258 <.0001
female class2 - female class3 -9.01107 2.64 2994 -3.409 0.0007
male class3 - female class3 -15.54020 2.60 2994 -5.982 <.0001
The adjust = "none" argument skips any multiple comparisons correction; this is done so the results match those obtained from the regression coefficients. If you'd prefer an ANOVA-style post-hoc approach, adjust accepts other options; see ?emmeans::contrast for details.
This produces a lot of comparisons; if we only cared about the comparisons one way (e.g. gender differences within each class), we can restrict our output to these results.
class = 1:
contrast estimate SE df t.ratio p.value
male - female -21.6 1.51 2994 -14.321 <.0001
class = 2:
contrast estimate SE df t.ratio p.value
male - female -16.8 1.22 2994 -13.722 <.0001
class = 3:
contrast estimate SE df t.ratio p.value
male - female -15.5 2.60 2994 -5.982 <.0001
Note that the results are identical; this is simply a more compact presentation.
. regress y i.class##c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(5, 2994) = 21.15
Model | 47475.1592 5 9495.03183 Prob > F = 0.0000
Residual | 1343957.85 2,994 448.883716 R-squared = 0.0341
-------------+---------------------------------- Adj R-squared = 0.0325
Total | 1391433.01 2,999 463.965657 Root MSE = 21.187
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
class |
2 | -13.19784 3.880448 -3.40 0.001 -20.80646 -5.589225
3 | -11.14087 4.189539 -2.66 0.008 -19.35553 -2.926199
|
age | -.4751707 .0630779 -7.53 0.000 -.5988511 -.3514903
|
class#c.age |
2 | .2484541 .0887425 2.80 0.005 .0744516 .4224565
3 | .2903105 .1107395 2.62 0.009 .0731774 .5074437
|
_cons | 90.55752 3.009782 30.09 0.000 84.65606 96.45897
------------------------------------------------------------------------------
. margins class, dydx(age) pwcompare(pv)
Pairwise comparisons of average marginal effects
Model VCE: OLS Number of obs = 3,000
Expression: Linear prediction, predict()
dy/dx wrt: age
-----------------------------------------------------
| Contrast Delta-method Unadjusted
| dy/dx std. err. t P>|t|
-------------+---------------------------------------
age |
class |
2 vs 1 | .2484541 .0887425 2.80 0.005
3 vs 1 | .2903105 .1107395 2.62 0.009
3 vs 2 | .0418565 .1103668 0.38 0.705
-----------------------------------------------------
R (emmeans)
Call:
lm(formula = y ~ class * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-75.335 -14.841 -0.036 14.469 73.169
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 90.55752 3.00978 30.088 < 2e-16 ***
class2 -13.19784 3.88045 -3.401 0.00068 ***
class3 -11.14087 4.18954 -2.659 0.00787 **
age -0.47517 0.06308 -7.533 6.52e-14 ***
class2:age 0.24845 0.08874 2.800 0.00515 **
class3:age 0.29031 0.11074 2.622 0.00880 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.19 on 2994 degrees of freedom
Multiple R-squared: 0.03412, Adjusted R-squared: 0.03251
F-statistic: 21.15 on 5 and 2994 DF, p-value: < 2.2e-16
contrast estimate SE df t.ratio p.value
class1 - class2 -0.2485 0.0887 2994 -2.800 0.0051
class1 - class3 -0.2903 0.1107 2994 -2.622 0.0088
class2 - class3 -0.0419 0.1104 2994 -0.379 0.7045
. regress y i.class##i.sex
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(5, 2994) = 92.91
Model | 186898.199 5 37379.6398 Prob > F = 0.0000
Residual | 1204534.81 2,994 402.316235 R-squared = 0.1343
-------------+---------------------------------- Adj R-squared = 0.1329
Total | 1391433.01 2,999 463.965657 Root MSE = 20.058
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
class |
2 | 11.37444 1.573314 7.23 0.000 8.289552 14.45932
3 | 21.6188 1.588487 13.61 0.000 18.50416 24.73344
|
sex |
Female | 21.62507 1.509999 14.32 0.000 18.66433 24.58581
|
class#sex |
2#Female | -4.851582 1.942744 -2.50 0.013 -8.66083 -1.042333
3#Female | -6.084875 3.004638 -2.03 0.043 -11.97624 -.1935115
|
_cons | 50.6107 1.367932 37.00 0.000 47.92852 53.29288
------------------------------------------------------------------------------
. margins sex@class, contrast(pv nowald)
Contrasts of adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
------------------------------------------------------------
| Delta-method
| Contrast std. err. t P>|t|
--------------------+---------------------------------------
sex@class |
(Female vs base) 1 | 21.62507 1.509999 14.32 0.000
(Female vs base) 2 | 16.77349 1.222358 13.72 0.000
(Female vs base) 3 | 15.5402 2.597644 5.98 0.000
------------------------------------------------------------
. margins class, dydx(sex) pwcompare(pv)
Pairwise comparisons of conditional marginal effects
Model VCE: OLS Number of obs = 3,000
Expression: Linear prediction, predict()
dy/dx wrt: 1.sex
-----------------------------------------------------
| Contrast Delta-method Unadjusted
| dy/dx std. err. t P>|t|
-------------+---------------------------------------
0.sex | (base outcome)
-------------+---------------------------------------
1.sex |
class |
2 vs 1 | -4.851582 1.942744 -2.50 0.013
3 vs 1 | -6.084875 3.004638 -2.03 0.043
3 vs 2 | -1.233294 2.870873 -0.43 0.668
-----------------------------------------------------
Note: dy/dx for factor levels is the discrete change
from the base level.
Note that this works because sex is binary, so its
coefficient (obtained via dydx) corresponds to a 1-unit change,
from 0 to 1. If you had a more complicated situation (e.g. there was a
third gender category but you only wanted to compare male versus
female), you could instead manually define your contrast.
. margins {sex +1 -1}#{class +1 -1 0}, contrast(pv nowald)
Contrasts of adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
-----------------------------------------------------
| Delta-method
| Contrast std. err. t P>|t|
-------------+---------------------------------------
sex#class |
(1) (1) | -4.851582 1.942744 -2.50 0.013
-----------------------------------------------------
. margins {sex +1 -1}#{class +1 -1 0} {sex +1 -1}#{class +1 0 -1} {sex +1 -1}#{
> class 0 +1 -1}, contrast(pv nowald)
Contrasts of adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
-----------------------------------------------------
| Delta-method
| Contrast std. err. t P>|t|
-------------+---------------------------------------
sex#class |
(1) (1) | -4.851582 1.942744 -2.50 0.013
(1) (2) | -6.084875 3.004638 -2.03 0.043
(1) (3) | -1.233294 2.870873 -0.43 0.668
-----------------------------------------------------
Here the {variable # # #} notation defines a contrast:
the values are the weights of a linear combination of the levels. E.g. if we
wanted to test male = female, or
male - female = 0, we could write this as
(+1)male + (-1)female = 0, so the contrast is
+1 -1 (in sex, male is the lower level,
female the higher). For class, we want to compare each pair of
classes, so class1 = class2 is equivalent to
(+1)class1 + (-1)class2 + (0)class3 = 0, whose contrast is
+1 -1 0.
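The analogous manual contrast in emmeans is a named list of weights passed to contrast(); a sketch using the built-in warpbreaks data (its factor tension has levels L, M, H):

```r
library(emmeans)

mod <- lm(breaks ~ tension, data = warpbreaks)
em <- emmeans(mod, ~ tension)

# Weights follow the factor's level order, so c(1, -1, 0) tests L - M = 0,
# analogous to Stata's {variable +1 -1 0} notation
contrast(em, method = list("L - M" = c(1, -1, 0)))
```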
If you wanted a diff-in-diff-in-diff estimate (e.g. say you've got
males and females, some of whom undergo each of treatment 0 and treatment 1,
and you want to test whether the difference in pre-post changes among
men is significantly different from the difference in pre-post changes
among women), you can run something like
margins sex#treatment, dydx(prepost) contrast(pv nowald).
R (emmeans)
Call:
lm(formula = y ~ class * sex, data = m)
Residuals:
Min 1Q Median 3Q Max
-72.029 -13.285 0.464 13.741 67.964
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.611 1.368 36.998 < 2e-16 ***
class2 11.374 1.573 7.230 6.12e-13 ***
class3 21.619 1.588 13.610 < 2e-16 ***
sexfemale 21.625 1.510 14.321 < 2e-16 ***
class2:sexfemale -4.852 1.943 -2.497 0.0126 *
class3:sexfemale -6.085 3.005 -2.025 0.0429 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared: 0.1343, Adjusted R-squared: 0.1329
F-statistic: 92.91 on 5 and 2994 DF, p-value: < 2.2e-16
class = 1:
contrast estimate SE df t.ratio p.value
male - female -21.6 1.51 2994 -14.321 <.0001
class = 2:
contrast estimate SE df t.ratio p.value
male - female -16.8 1.22 2994 -13.722 <.0001
class = 3:
contrast estimate SE df t.ratio p.value
male - female -15.5 2.60 2994 -5.982 <.0001
contrast                                         estimate   SE   df t.ratio p.value
(male - female class1) - (male - female class2)     -4.85 1.94 2994  -2.497  0.0126
(male - female class1) - (male - female class3)     -6.08 3.00 2994  -2.025  0.0429
(male - female class2) - (male - female class3)     -1.23 2.87 2994  -0.430  0.6675
. regress y i.class##c.age
Source | SS df MS Number of obs = 3,000
-------------+---------------------------------- F(5, 2994) = 21.15
Model | 47475.1592 5 9495.03183 Prob > F = 0.0000
Residual | 1343957.85 2,994 448.883716 R-squared = 0.0341
-------------+---------------------------------- Adj R-squared = 0.0325
Total | 1391433.01 2,999 463.965657 Root MSE = 21.187
------------------------------------------------------------------------------
y | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
class |
2 | -13.19784 3.880448 -3.40 0.001 -20.80646 -5.589225
3 | -11.14087 4.189539 -2.66 0.008 -19.35553 -2.926199
|
age | -.4751707 .0630779 -7.53 0.000 -.5988511 -.3514903
|
class#c.age |
2 | .2484541 .0887425 2.80 0.005 .0744516 .4224565
3 | .2903105 .1107395 2.62 0.009 .0731774 .5074437
|
_cons | 90.55752 3.009782 30.09 0.000 84.65606 96.45897
------------------------------------------------------------------------------
. margins class, at(age = (20(10)60))
Adjusted predictions Number of obs = 3,000
Model VCE: OLS
Expression: Linear prediction, predict()
1._at: age = 20
2._at: age = 30
3._at: age = 40
4._at: age = 50
5._at: age = 60
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
_at#class |
1 1 | 81.0541 1.793005 45.21 0.000 77.53845 84.56975
1 2 | 72.82534 1.284642 56.69 0.000 70.30647 75.34421
1 3 | 75.71945 1.27105 59.57 0.000 73.22723 78.21167
2 1 | 76.30239 1.219243 62.58 0.000 73.91176 78.69303
2 2 | 70.55818 .8030164 87.87 0.000 68.98366 72.1327
2 3 | 73.87085 .8136044 90.79 0.000 72.27557 75.46613
3 1 | 71.55069 .744313 96.13 0.000 70.09127 73.0101
3 2 | 68.29101 .6470303 105.55 0.000 67.02234 69.55968
3 3 | 72.02224 1.168425 61.64 0.000 69.73125 74.31324
4 1 | 66.79898 .6459221 103.42 0.000 65.53249 68.06548
4 2 | 66.02384 .9857706 66.98 0.000 64.09099 67.9567
4 3 | 70.17364 1.93012 36.36 0.000 66.38915 73.95814
5 1 | 62.04727 1.037397 59.81 0.000 60.01319 64.08136
5 2 | 63.75668 1.517933 42.00 0.000 60.78038 66.73298
5 3 | 68.32504 2.782515 24.56 0.000 62.86921 73.78088
------------------------------------------------------------------------------
. marginsplot, recastci(rarea) ciopts(fcolor(%25) lcolor(%25))
Variables that uniquely identify margins: age class
The recastci(rarea) option plots the confidence region,
rather than just confidence intervals at each estimated point (to more
closely mimic R's output). The ciopts() option allows you to pass
options for the confidence intervals; these particular options just make
the confidence region (and its outline) transparent.
R (interactions)
Call:
lm(formula = y ~ class * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-75.335 -14.841 -0.036 14.469 73.169
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 90.55752 3.00978 30.088 < 2e-16 ***
class2 -13.19784 3.88045 -3.401 0.00068 ***
class3 -11.14087 4.18954 -2.659 0.00787 **
age -0.47517 0.06308 -7.533 6.52e-14 ***
class2:age 0.24845 0.08874 2.800 0.00515 **
class3:age 0.29031 0.11074 2.622 0.00880 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.19 on 2994 degrees of freedom
Multiple R-squared: 0.03412, Adjusted R-squared: 0.03251
F-statistic: 21.15 on 5 and 2994 DF, p-value: < 2.2e-16
R (emmeans)
Call:
lm(formula = y ~ class * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-75.335 -14.841 -0.036 14.469 73.169
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 90.55752 3.00978 30.088 < 2e-16 ***
class2 -13.19784 3.88045 -3.401 0.00068 ***
class3 -11.14087 4.18954 -2.659 0.00787 **
age -0.47517 0.06308 -7.533 6.52e-14 ***
class2:age 0.24845 0.08874 2.800 0.00515 **
class3:age 0.29031 0.11074 2.622 0.00880 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.19 on 2994 degrees of freedom
Multiple R-squared: 0.03412, Adjusted R-squared: 0.03251
F-statistic: 21.15 on 5 and 2994 DF, p-value: < 2.2e-16
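The interaction plot itself is not reproduced here; with emmeans it can be drawn with emmip(). A self-contained sketch with simulated data (all names are illustrative, and ggplot2 is assumed to be installed):

```r
library(emmeans)

# Simulated stand-in for the class/age data (names are illustrative)
set.seed(3)
m <- data.frame(
  class = factor(sample(1:3, 600, replace = TRUE)),
  age = runif(600, 20, 60)
)
m$y <- 90 - 0.4 * m$age + 5 * as.numeric(m$class) + rnorm(600, sd = 20)

mod <- lm(y ~ class * age, data = m)

# One line per class across ages 20, 30, ..., 60, with confidence bands
p <- emmip(mod, class ~ age, at = list(age = seq(20, 60, by = 10)), CIs = TRUE)
p
```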
R (ggeffects)
Call:
lm(formula = y ~ class * age, data = m)
Residuals:
Min 1Q Median 3Q Max
-75.335 -14.841 -0.036 14.469 73.169
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 90.55752 3.00978 30.088 < 2e-16 ***
class2 -13.19784 3.88045 -3.401 0.00068 ***
class3 -11.14087 4.18954 -2.659 0.00787 **
age -0.47517 0.06308 -7.533 6.52e-14 ***
class2:age 0.24845 0.08874 2.800 0.00515 **
class3:age 0.29031 0.11074 2.622 0.00880 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.19 on 2994 degrees of freedom
Multiple R-squared: 0.03412, Adjusted R-squared: 0.03251
F-statistic: 21.15 on 5 and 2994 DF, p-value: < 2.2e-16
. logit yc i.class, nolog or
Logistic regression Number of obs = 3,000
LR chi2(2) = 12.39
Prob > chi2 = 0.0020
Log likelihood = -1651.6397 Pseudo R2 = 0.0037
------------------------------------------------------------------------------
yc | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
class |
2 | .8231057 .0789757 -2.03 0.042 .6819996 .9934067
3 | .6747704 .0779699 -3.40 0.001 .5380213 .846277
|
_cons | .3718535 .0241593 -15.23 0.000 .327393 .4223519
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Below, the references to each coefficient
(e.g. _b[2.class]) were obtained by running
logit, coeflegend after the original logit
call.
. lincom _b[_cons], or
( 1) [yc]_cons = 0
------------------------------------------------------------------------------
yc | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | .3718535 .0241593 -15.23 0.000 .327393 .4223519
------------------------------------------------------------------------------
. lincom _b[_cons] + _b[2.class], or
( 1) [yc]2.class + [yc]_cons = 0
------------------------------------------------------------------------------
yc | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | .3060748 .0216103 -16.77 0.000 .2665193 .3515008
------------------------------------------------------------------------------
. lincom _b[_cons] + _b[3.class], or
( 1) [yc]3.class + [yc]_cons = 0
------------------------------------------------------------------------------
yc | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | .2509158 .0239763 -14.47 0.000 .2080613 .302597
------------------------------------------------------------------------------
. lincom _b[2.class], or
( 1) [yc]2.class = 0
------------------------------------------------------------------------------
yc | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | .8231057 .0789757 -2.03 0.042 .6819996 .9934067
------------------------------------------------------------------------------
. lincom _b[3.class], or
( 1) [yc]3.class = 0
------------------------------------------------------------------------------
yc | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | .6747704 .0779699 -3.40 0.001 .5380213 .846277
------------------------------------------------------------------------------
. lincom _b[3.class] - _b[2.class], or
( 1) - [yc]2.class + [yc]3.class = 0
------------------------------------------------------------------------------
yc | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
(1) | .8197858 .0973987 -1.67 0.094 .6494852 1.034741
------------------------------------------------------------------------------
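For reference, similar quantities can be sketched in R with glm() and emmeans (simulated data; names are illustrative). On the response scale, emmeans() reports marginal predicted probabilities (rather than Stata's baseline odds), and pairs() back-transforms the pairwise differences into odds ratios:

```r
library(emmeans)

set.seed(4)
d <- data.frame(class = factor(sample(1:3, 3000, replace = TRUE)))
d$yc <- rbinom(3000, 1,
               plogis(-1 - 0.2 * (d$class == "2") - 0.4 * (d$class == "3")))

fit <- glm(yc ~ class, family = binomial, data = d)

# Marginal means, back-transformed to predicted probabilities
emmeans(fit, ~ class, type = "response")

# Pairwise comparisons back-transform to odds ratios; adjust = "none"
# matches the unadjusted lincom results above
pairs(emmeans(fit, ~ class), type = "response", adjust = "none")
```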